***Merging 2014 candidate data with 2014 campaign expenditure and vote data***

**Preparing the 2014 candidate data for the merge:
clear
cd ".../Brazil_candidates_replication/"

use cand_2014_appended.dta, clear

*Keep only candidates for federal deputy (deputados federais)
rename ds_cargo office_CD
keep if office_CD=="DEPUTADO FEDERAL"

*Rename variables (the suffix _CD marks variables that come from the candidate data)
rename sq_candidato id_2014
rename sg_uf state
rename nm_candidato name
rename dt_nascimento birthday
rename nm_urna_candidato shortname
rename ds_cor_raca ethnicity_CD
rename nr_idade_data_posse age_CD
rename ds_estado_civil civilstatus_CD
rename sg_partido party_CD
rename ds_genero gender_CD
rename ds_grau_instrucao education_CD
rename sq_coligacao list_id

keep office_CD id_2014 state name birthday shortname ethnicity_CD age_CD /*
*/ civilstatus_CD party_CD gender_CD education_CD list_id

* Format birthday as date
gen double daily = date(birthday, "DMY")
format daily %td
drop birthday
rename daily birthday
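
*Optional check (a sketch, not part of the original workflow): date() returns
*missing for strings it cannot parse as day-month-year, so count them here
count if missing(birthday)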

save "cand_2014.dta", replace

**Merge with campaign expenditure (prepared in Brazil_append_candidates.do):

merge 1:1 id_2014 state using "despesas_2014_appended.dta"

*Drop expenditure records that match no candidate
drop if _merge==2
drop _merge

save "cand_2014_campexp.dta", replace


**Merge with the election (vote) data:
merge 1:1 id_2014 state using "vote_2014.dta"

*Keep only candidates that appear in both the candidate file and the vote file
drop if _merge!=3
drop _merge

*Check for duplicates on the merge key (state and candidate id)
duplicates report state id_2014
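*Possible further checks on other combinations (a sketch; which combinations
*matter is an assumption, using only variables kept above)
duplicates report state name birthday
duplicates report state shortname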

*Add state codes
encode state, gen(state_nr)
label variable state "State"
label variable state_nr "State"

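*Create ethnicity dummies
tab ethnicity_CD, missing
*Drop candidates whose ethnicity was not disclosed
drop if ethnicity_CD=="NÃO DIVULGÁVEL"
*tab, gen() numbers the dummies in the sorted order of the categories; the
*renames below assume five categories in the order Asian, White, Indigenous, Brown, Black
tab ethnicity_CD, gen(ethnicity_)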
rename ethnicity_1 cand_asian
rename ethnicity_2 cand_white
rename ethnicity_3 cand_indigenous
rename ethnicity_4 cand_brown
rename ethnicity_5 cand_black
label variable cand_asian "Asian"
label variable cand_white "White"
label variable cand_indigenous "Indigenous"
label variable cand_brown "Brown"
label variable cand_black "Black"
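
*Optional check (a sketch): cross-tabulate the original categories against one
*dummy to confirm that the renames above picked the intended category
tab ethnicity_CD cand_white, missing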

*Encode and label
encode ethnicity_CD, gen(ethnicity)
label variable ethnicity "Ethnicity"
label define methnicity 1 "Asian" 2 "White" 3 "Indigenous" 4 "Brown" 5 "Black"
label values ethnicity methnicity

encode party_CD, gen(party)
label variable party "Party"

encode education_CD, gen(education)
*encode assigns codes in alphabetical order; recode them into ascending order of education
recode education 5=1 2=2 1=3 4=4 3=5 7=6 6=7
label variable education "Level of education"
label define meducation 1 "Reads and writes" 2 "Basic, incomplete" /*
*/ 3 "Basic, complete" 4 "Medium, incomplete" 5 "Medium, complete" /*
*/ 6 "Superior, incomplete" 7 "Superior, complete"
label values education meducation
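
*Optional check (a sketch): compare the original strings with the recoded
*levels to confirm that each category received the intended label
tab education_CD education, missing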

encode civilstatus_CD, gen(civil)
label variable civil "Civil status"
label define mcivil 1 "Married" 2 "Divorced" 3 "Separated" 4 "Single" 5 "Widowed"
label values civil mcivil

encode gender_CD, gen(gender)
label variable gender "Gender"

rename age_CD age
gen age2=age^2
label variable age "Age"
label variable age2 "Age squared"

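*Create logged vote share variable (candidate votes / total candidate votes
*in the state; missing for candidates with zero votes, since ln(0) is missing)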
bys state: egen tot_cand_votes_state=total(votes_CED)
gen votes_cand_rel=votes_CED/tot_cand_votes_state
gen Ln_votes_cand=ln(votes_cand_rel)
label variable Ln_votes_cand "Logged candidate vote share"
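
*Optional check (a sketch): ln() of zero is missing, so candidates without
*votes drop out of any analysis that uses Ln_votes_cand; count them
count if votes_CED==0 | missing(votes_CED)
summarize votes_cand_rel Ln_votes_cand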

*Campaign expenditure
*Assume that missing = 0 (following Jansuz; test plausibility later)
replace camp_exp_CE=0 if missing(camp_exp_CE)
gen camp_exp_k=camp_exp_CE/1000
gen camp_exp_k2=camp_exp_k^2
label variable camp_exp_k "Campaign expenditure in 1000s"
label variable camp_exp_k2 "Campaign expenditure in 1000s, squared"

*Generate variable for population shares for sorting in the graphs
gen white_pop=27.11 if state=="AC"
replace white_pop=28.64 if state=="AL"
replace white_pop=20.8 if state=="AM"
replace white_pop=21.4 if state=="AP"
replace white_pop=21.08 if state=="BA"
replace white_pop=33.98 if state=="CE"
replace white_pop=42.23 if state=="DF"
replace white_pop=41.35 if state=="ES"
replace white_pop=42.95 if state=="GO"
replace white_pop=22.77 if state=="MA"
replace white_pop=45.94 if state=="MG"
replace white_pop=51.59 if state=="MS"
replace white_pop=37.19 if state=="MT"
replace white_pop=21.24 if state=="PA"
replace white_pop=36.08 if state=="PB"
replace white_pop=35.26 if state=="PE"
replace white_pop=24.9 if state=="PI"
replace white_pop=69.06 if state=="PR"
replace white_pop=50.23 if state=="RJ"
replace white_pop=43.32 if state=="RN"
replace white_pop=36.69 if state=="RO"
replace white_pop=23.95 if state=="RR"
replace white_pop=81.72 if state=="RS"
replace white_pop=85.75 if state=="SC"
replace white_pop=25.65 if state=="SE"
replace white_pop=63.68 if state=="SP"
replace white_pop=26.43 if state=="TO"
label variable white_pop "White population share (%)"
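
*Optional check (a sketch): any state code not covered by the list above
*leaves white_pop missing; this count should be zero
count if missing(white_pop)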

*Label remaining relevant variables
label variable office_CD "Office"
label variable id_2014 "TSE 2014 candidate identifier"
label variable name "Full name"
label variable shortname "Ballot name"

*Drop the source variables that have been recoded above
drop party_CD gender_CD education_CD civilstatus_CD ethnicity_CD camp_exp_CE

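*Replace accented characters in string values with unaccented equivalents to
*avoid inconsistencies in the spelling of names
*(cleanchars is a user-written command, not part of official Stata; install it first if needed)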
cleanchars _all, in(Á Ã Â Ç É Ê Í Ô Ó Õ Ú) out(A A A C E E I O O O U) vval

save "cand_2014_campexp_vote.dta", replace

*Continue with Brazil_candidates_incumbency.do
